colmem: improve memory-limiting behavior of the accounting helpers #85440
Conversation
Force-pushed from 0e6e2f3 to 4dae132
Force-pushed from bead01c to eb420e3
Force-pushed from 2d064dc to 7478d18
Force-pushed from fde831e to 5e96448
Nice refactoring/stability improvements!
Reviewed 7 of 7 files at r1, 8 of 8 files at r2, 27 of 27 files at r3, 4 of 4 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/colfetcher/cfetcher.go
line 852 at r2 (raw file):
if cf.machine.limitHint > 0 && cf.machine.rowIdx >= cf.machine.limitHint {
	// If we made it to our limit hint, so output our batch early to
[nit] "If we" -> "We"
pkg/sql/colfetcher/vectorized_batch_size_test.go
line 153 at r3 (raw file):
for _, tc := range []struct {
	// numRows and rowSize must be of the same length, with each index
	// specifying the number of rows of the corresponding size to be
[nit] add that the row size is in bytes.
pkg/sql/colmem/allocator.go
line 567 at r3 (raw file):
// Init initializes the helper.
func (h *AccountingHelper) Init(allocator *Allocator, memoryLimit int64) {
[nit] Maybe mention here that the allocator can be shared between different components.
pkg/sql/colmem/allocator.go
line 573 at r3 (raw file):
// disk spilling" scenario, but the helper should ignore that, so we
// override it to the default value of the distsql_workmem variable.
memoryLimit = 64 << 20 /* 64 MiB */
Would it be better to use `GetWorkMemLimit` here? I'm wondering if there could be a case where a user sets the working memory limit to 1 for whatever reason, then gets unexpected results when we revert to the default.
pkg/sql/colmem/allocator.go
line 584 at r3 (raw file):
// yet to be set, use 0 if unknown.
func (h *AccountingHelper) ResetMaybeReallocate(
	typs []*types.T, oldBatch coldata.Batch, remainingTuples int,
[nit] `remainingTuples` confused me a bit when I saw this method being called elsewhere; maybe something like `tuplesToBeSet` or `remainingTuplesToSet` would be clearer?
pkg/sql/colmem/allocator.go
line 591 at r3 (raw file):
// with other components, thus, we cannot ask it directly for the batch
// mem size, yet the allocator can provide a useful upper bound.)
if batchMemSizeUpperBound := h.allocator.Used(); h.discardBatch(batchMemSizeUpperBound) {
Shouldn't we also be setting `h.maxCapacity` to the current batch capacity once we exceed the memory limit? (Rather than only setting it to half the current capacity once we reach double the limit.)
pkg/sql/colmem/allocator.go
line 719 at r3 (raw file):
// yet to be set, use 0 if unknown.
func (h *SetAccountingHelper) ResetMaybeReallocate(
	typs []*types.T, oldBatch coldata.Batch, remainingTuples int,
[nit] same as for `AccountingHelper`.
pkg/sql/colmem/allocator.go
line 821 at r3 (raw file):
// we update the memorized capacity. If it's the latter, then on the
// following call to ResetMaybeReallocate, the batch will be discarded.
h.helper.maxCapacity = rowIdx + 1
Should we be doing this in the `AccountingHelper` instead?
pkg/sql/colmem/allocator.go
line 713 at r4 (raw file):
// ResetMaybeReallocate is a light wrapper on top of
// Allocator.resetMaybeReallocate (and thus has the same contract) with an
[nit] This no longer has exactly the same contract.
pkg/sql/colmem/allocator_test.go
line 412 at r3 (raw file):
// Now the limit has been exceeded by too much again - a new
// batch of smaller capacity must be allocated.
{1, true, 100},
Could you add one last iteration like `{1, false, 1300}` to ensure we never attempt to shrink the capacity to zero?
Force-pushed from f92bf34 to 5af8fcf
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)
pkg/sql/colfetcher/cfetcher.go
line 852 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] "If we" -> "We"
Done.
pkg/sql/colfetcher/vectorized_batch_size_test.go
line 153 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] add that the row size is in bytes.
Done.
pkg/sql/colmem/allocator.go
line 567 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Maybe mention here that the allocator can be shared between different components.
Done.
pkg/sql/colmem/allocator.go
line 573 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Would it be better to use `GetWorkMemLimit` here? I'm wondering if there could be a case where a user sets the working memory limit to 1 for whatever reason, then gets unexpected results when we revert to the default.
We have a similar check in several places. This is a good point though - we should just prohibit the users from setting exactly 1B and reserve that value for internal test use. I'll open up a separate PR for that.
Another note is that I chose to hard-code the number 64MiB instead of using `execinfra.DefaultMemoryLimit` in order to not pollute the dependency graph.
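For illustration, a minimal sketch of the sentinel handling being discussed - the function name and structure here are assumptions for this sketch, not the actual colmem code:

```go
// effectiveMemoryLimit sketches the override discussed above: a 1-byte
// limit is the test-only "force disk spilling" convention, so the helper
// ignores it and falls back to the default distsql_workmem value, which
// is hard-coded to avoid a dependency on execinfra.
func effectiveMemoryLimit(memoryLimit int64) int64 {
	if memoryLimit == 1 {
		return 64 << 20 // 64 MiB
	}
	return memoryLimit
}
```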
pkg/sql/colmem/allocator.go
line 584 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] `remainingTuples` confused me a bit when I saw this method being called elsewhere; maybe something like `tuplesToBeSet` or `remainingTuplesToSet` would be clearer?
I like `tuplesToBeSet`, done.
pkg/sql/colmem/allocator.go
line 591 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Shouldn't we also be setting `h.maxCapacity` to the current batch capacity once we exceed the memory limit? (Rather than only setting it to half the current capacity once we reach double the limit.)
That's a good point. This is not strictly necessary for correct behavior, because `Allocator.ResetMaybeReallocate` would never allocate a new batch once it realizes that the old batch has reached the memory limit, but explicitly memorizing the max capacity in such a scenario makes things more clear (and avoids computing the batch memory size). Done.
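To make the two thresholds concrete, here is a hedged, self-contained sketch of the discard-and-halve rule plus the memorization added in this thread; all names are illustrative, not the actual `AccountingHelper` API:

```go
package main

import "fmt"

// batchCapHeuristic sketches the rule discussed above: at 2x the limit the
// batch is discarded and the ceiling is halved; at 1x the current capacity
// is memorized as the ceiling.
func batchCapHeuristic(maxCapacity *int, curCapacity int, used, limit int64) (discard bool) {
	switch {
	case used >= 2*limit:
		// Exceeded the limit by at least 2x: discard the batch; future
		// batches get at most half its capacity (floored at 1 so the
		// capacity never shrinks to zero).
		*maxCapacity = curCapacity / 2
		if *maxCapacity < 1 {
			*maxCapacity = 1
		}
		return true
	case used >= limit:
		// Exceeded the limit: memorize the current capacity as the ceiling.
		*maxCapacity = curCapacity
	}
	return false
}

func main() {
	maxCap := 0
	fmt.Println(batchCapHeuristic(&maxCap, 1024, 130<<20, 64<<20), maxCap) // true 512
	fmt.Println(batchCapHeuristic(&maxCap, 512, 65<<20, 64<<20), maxCap)   // false 512
}
```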
pkg/sql/colmem/allocator.go
line 719 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] same as for `AccountingHelper`.
Done.
pkg/sql/colmem/allocator.go
line 821 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should we be doing this in the `AccountingHelper` instead?
I think in `SetAccountingHelper` we can be more precise, since `AccountingHelper` operates at a batch granularity. Consider an example when the batch has a capacity of 16, but the `SetAccountingHelper` realizes that the memory limit is reached only after 3 tuples are set. The `AccountingHelper` could memorize 16 as `maxCapacity`, which would not work well with the expectations of the `SetAccountingHelper` - its user should now be setting at most 3 tuples.
The way I think about it is that the `AccountingHelper` works at a batch granularity, and `SetAccountingHelper` is a specialization of the `AccountingHelper` where we work at a tuple granularity, so the latter can make better decisions.
Does this make sense?
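A hedged sketch of the tuple-granularity idea from the 16-capacity/3-tuples example above; the type and method names are hypothetical, not the actual `colmem.SetAccountingHelper` API:

```go
package main

import "fmt"

// setHelperSketch illustrates tuple-granularity accounting: the helper is
// consulted after every tuple, so it can declare a batch full before its
// physical capacity is reached.
type setHelperSketch struct {
	used, memoryLimit int64 // used would come from the shared Allocator
	maxCapacity       int
}

func (h *setHelperSketch) AccountForSet(rowIdx int) (batchDone bool) {
	if h.used >= h.memoryLimit {
		// Memorize rowIdx+1 (e.g. 3) rather than the batch's full
		// capacity (e.g. 16), which a batch-level helper would have used.
		h.maxCapacity = rowIdx + 1
		return true
	}
	return false
}

func main() {
	h := setHelperSketch{used: 70 << 20, memoryLimit: 64 << 20}
	fmt.Println(h.AccountForSet(2), h.maxCapacity) // true 3
}
```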
pkg/sql/colmem/allocator.go
line 713 at r4 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] This no longer has exactly the same contract.
Done.
pkg/sql/colmem/allocator_test.go
line 412 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Could you add one last iteration like `{1, false, 1300}` to ensure we never attempt to shrink the capacity to zero?
Done.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
pkg/sql/colmem/allocator.go
line 821 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think in `SetAccountingHelper` we can be more precise, since `AccountingHelper` operates at a batch granularity. Consider an example when the batch has a capacity of 16, but the `SetAccountingHelper` realizes that the memory limit is reached only after 3 tuples are set. The `AccountingHelper` could memorize 16 as `maxCapacity`, which would not work well with the expectations of the `SetAccountingHelper` - its user should now be setting at most 3 tuples.
The way I think about it is that the `AccountingHelper` works at a batch granularity, and `SetAccountingHelper` is a specialization of the `AccountingHelper` where we work at a tuple granularity, so the latter can make better decisions. Does this make sense?
Yeah, SGTM.
I ran microbenchmarks of
This PR will benefit from an extra set of eyes since I want to backport this to 22.1, so I'll wait for @michae2 to take a look too.
@michae2 will you have time to take a look at this this week? That's ok if not. I'd like to merge it before stability, and I'm off on Th-F.
Nice work! I'm still reading.
This commit introduces a new heuristic as follows:
- the first time a batch exceeds the memory limit, its capacity is memorized, and from now on that capacity will determine the upper bound on the capacities of the batches allocated through the helper;
- if at any point in time a batch exceeds the memory limit by at least a factor of two, then that batch is discarded, and the capacity will never exceed half of the capacity of the discarded batch;
- if the memory limit is not reached, then the behavior of the dynamic growth of the capacity provided by Allocator.ResetMaybeReallocate is still applicable (i.e. the capacities will grow exponentially until coldata.BatchSize()).

Note that this heuristic does not have the ability to grow the maximum capacity once it's been set, although it might make sense to do so (say, if after shrinking the capacity, the next five times we see that the batch is using less than half of the memory limit). This is a conscious omission since I want this change to be backported, and never growing seems like a safer choice. Thus, this improvement is left as a TODO.
Sounds a lot like TCP congestion control. Maybe once we've hit the memory limit, we could use additive increase / multiplicative decrease instead of staying at that fixed capacity? But that's just an idea, this is already an improvement over what we've got. A TODO is fine. I'll keep reading.
Reviewed 6 of 7 files at r1, 29 of 29 files at r5, 2 of 8 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
Sounds a lot like TCP congestion control. Maybe once we've hit the memory limit, we could use additive increase / multiplicative decrease instead of staying at that fixed capacity?
Indeed, I also noticed a similar pattern. However, I realized that we cannot really apply the AIMD pattern here - the cost of that "additive" increase is non-trivial, since we would have to reallocate the whole underlying memory rather than "adding" a little more. Consider an example where we already have capacity for 32 rows: if our additive step is 2 rows, we'd release the whole 32-row batch to allocate a 34-row one. That reallocation has non-trivial implications, especially if we never reach an equilibrium, because then we'd be continuously reallocating memory - we would rather stick to a smaller batch and never reallocate again (unless that batch is extremely small).
(I guess we could use additive increase when the underlying capacities of the vectors allow it - i.e. say we asked for `make([]int64, 10)` but got a slice of capacity 16 - then we could take the additive step a few times at no cost. However, that would not be trivial for varlen types like Bytes.)
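A small runnable illustration of the reallocation cost mentioned above - nothing here is CockroachDB code, just plain Go slices standing in for a batch's backing memory:

```go
package main

import "fmt"

func main() {
	batch := make([]int64, 32) // a 32-row "batch"; len == cap == 32
	before := &batch[0]

	// Additive increase by 2 rows: len 32 -> 34 exceeds the capacity, so
	// append allocates a new backing array and copies all 32 rows over.
	batch = append(batch, 0, 0)
	fmt.Println(cap(batch) > 32, before == &batch[0]) // true false

	// With spare capacity already available, the same growth is free: no
	// reallocation, no copy - the "at no cost" case from the comment above.
	spare := make([]int64, 32, 64)
	before = &spare[0]
	spare = append(spare, 0, 0)
	fmt.Println(before == &spare[0]) // true
}
```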
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
I feel like the 'right' way to solve this in the long term might be to make the various
It's an interesting idea that'd be worth exploring. We'd need to be conscious about the performance implications though - I'm afraid that introducing those checks would have non-negligible overhead. However, at the same time those conditionals are likely to get very good correct branch prediction most of the time (once the batch has grown to maximum capacity), so maybe the impact would be relatively small. Still, we would probably lose the ability to inline most of the
I think only bytes columns would require any changes in
Yeah, fair point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this is all correct. Intuitively I feel like there is room for simplification in `Allocator.resetMaybeReallocate` and `AccountingHelper.ResetMaybeReallocate`, but it will require more thought to discover.
Can we let this bake for a little while before backporting?
Reviewed 6 of 8 files at r6, 27 of 27 files at r7, 4 of 4 files at r8, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)
pkg/sql/colexec/columnarizer.go
line 222 at r7 (raw file):
if reallocated {
	oldRows := c.buffered
	newRows := make(rowenc.EncDatumRows, c.batch.Capacity())
Should this sometimes be `c.helper.maxCapacity` instead of `c.batch.Capacity()`?
Thanks for the reviews! I agree with the premise that we should be able to find a solution that is both simpler and more precise.
Can we let this bake for a little while before backporting?
For sure - especially so that we'll test this behavior as part of QA next week.
bors r+
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
pkg/sql/colexec/columnarizer.go
line 222 at r7 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Should this sometimes be `c.helper.maxCapacity` instead of `c.batch.Capacity()`?
No, I don't think so. In the columnarizer we use the `AccountingHelper`, which operates at a batch granularity, so we won't ever have a case when `c.helper.maxCapacity < c.batch.Capacity()` - `ResetMaybeReallocate` will always give us a batch whose full capacity we can use. Only when using the `SetAccountingHelper` can we get the behavior where the helper tells us that the batch is done before its full capacity is used.
Build failed (retrying...)
bors r-
I think this needs a rebase.
Canceled.
bors r+
However, I realized that we cannot really apply the AIMD pattern here - the cost of that "additive" increase is non-trivial, since we would have to reallocate the whole underlying memory rather than "adding" a little more. Consider an example where we already have capacity for 32 rows: if our additive step is 2 rows, we'd release the whole 32-row batch to allocate a 34-row one. That reallocation has non-trivial implications, especially if we never reach an equilibrium, because then we'd be continuously reallocating memory - we would rather stick to a smaller batch and never reallocate again (unless that batch is extremely small).
Oh, good point, heh! 🙂
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @michae2, and @yuzefovich)
pkg/sql/colexec/columnarizer.go
line 222 at r7 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
No, I don't think so. In the columnarizer we use the `AccountingHelper`, which operates at a batch granularity, so we won't ever have a case when `c.helper.maxCapacity < c.batch.Capacity()` - `ResetMaybeReallocate` will always give us a batch whose full capacity we can use. Only when using the `SetAccountingHelper` can we get the behavior where the helper tells us that the batch is done before its full capacity is used.
Ah, thank you. This clarifies things for me.
Build failed (retrying...)
Build failed (retrying...)
Build succeeded
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from aec1290 to blathers/backport-release-22.1-85440: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
You may need to manually resolve merge conflicts with the backport tool.
Backport to branch 22.1.x failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
colmem: introduce a helper method when no memory limit should be applied
This commit is a pure mechanical change.
Release note: None
colmem: move some logic of capacity-limiting into the accounting helper
This commit moves the logic that was duplicated across each user of the
SetAccountingHelper into the helper itself. Clearly, this allows us to
de-duplicate some code, but it'll make it easier to refactor the code
which is done in the following commit.
Additionally, this commit makes a tiny change to make the resetting
behavior in the hash aggregator more precise.
Release note: None
colmem: improve memory-limiting behavior of the accounting helpers
This commit fixes an oversight in how we are allocating batches of the
"dynamic" capacity. We have two related ways for reallocating batches,
and both of them work by growing the capacity of the batch until the
memory limit is exceeded, and then the batch would be reused until the
end of the query execution. This is a reasonable heuristic under the
assumption that all tuples in the data stream are roughly equal in size,
but this might not be the case.
In particular, consider an example when 10k small rows of 1KiB are
followed by 10k large rows of 1MiB. According to our heuristic, we
happily grow the batch until 1024 in capacity, and then we do not shrink
the capacity of that batch, so once the large rows start appearing, we
put 1GiB worth of data into a single batch, significantly exceeding our
memory limit (usually 64MiB with the default `workmem` setting).

This commit introduces a new heuristic as follows:
- the first time a batch exceeds the memory limit, its capacity is memorized, and from now on that capacity will determine the upper bound on the capacities of the batches allocated through the helper;
- if at any point in time a batch exceeds the memory limit by at least a factor of two, then that batch is discarded, and the capacity will never exceed half of the capacity of the discarded batch;
- if the memory limit is not reached, then the behavior of the dynamic growth of the capacity provided by `Allocator.ResetMaybeReallocate` is still applicable (i.e. the capacities will grow exponentially until coldata.BatchSize()).
Note that this heuristic does not have the ability to grow the maximum capacity once it's been set, although it might make sense to do so (say, if after shrinking the capacity, the next five times we see that the batch is using less than half of the memory limit). This is a conscious omission since I want this change to be backported, and never growing seems like a safer choice. Thus, this improvement is left as a TODO.
Also, we still might create batches that are too large in memory footprint in those places that don't use the SetAccountingHelper (e.g. in the columnarizer), since we perform the memory limit check at the batch granularity. However, this commit improves things there so that we don't reuse such a batch on the next iteration and instead use half of its capacity.
Fixes: #76464.
Release note (bug fix): CockroachDB now more precisely respects the `distsql_workmem` setting, which improves the stability of each node and makes OOMs less likely.
colmem: unexport Allocator.ResetMaybeReallocate
This commit is a mechanical change to unexport `Allocator.ResetMaybeReallocate` so that users would be forced to use the method with the same name from the helpers. This required splitting off the tests into two files.
Release note: None